In the DOTPLOT folder there is a second program called DTPLT_ED.PRG. This program enables you to change the defaults themselves of the DOTPLOT program. On running the editor you will get a screen looking like:
You will recognize a multi button panel if you see one by now, so I guess this one will not give you too much trouble. The top-left block is for changing the default extensions of DNA and protein files, respectively. If one of these two buttons is clicked upon the program enters the editor mode and you are given the opportunity to edit the old extensions. Likewise you can edit the window- and score-values of the three standard protein score tables, these are all in the elongated box on the left site of the screen. In the middle are two small boxes; one to change the window and score of DNA comparison and the other the Quit option; to stop and leave the program.
The whole of the right of the screen is dedicated to the three extra score tables. Of these the names, windows, scores and comments all can be edited in the same way as with the other tables. An additional option is ``Values''; when this button is clicked upon a new picture will fill your screen:
As you can see it lists the complete set of values of the score table and this will fill allmost halve of your screen with, all but invisible small, lettering. Also present are an exit to return to your previous screen and an edit box. When you enter the second screen, the edit box reports on the Alanine-Alanine couple, and if you look at the table above you will see that the little box on the cross of the A-row and the A-column is indeed in reversed video. To change a value; just type in the new one and it will replace the old one. The corresponding box will switch to normal video and the next one will be activated. If you don't want to change all values but only some there are three ways to activate the box of your choice:
Press <RETURN> and keep it pressed untill you reach the right box.
Use the arrows on your keyboard.
Simply use the mouse to click in the desired box to activate it.
when you are finished click in ``Exit'' and then ``Quit''; all changes will be saved and DOTPLOT will be able to run with new sets of defaults and or a new table.
INFPG5.TXT
To explain this option I will have to tell you something more about the various methods of comparing proteins and their amino acids.
When DNA files are compared the scoring is fairly simple: for every identical base the score is incremented. This method is also available for proteins, but there are other options as well. Some amino acids are chemically more related than others; Glycine (R-H) is nearer to Alanine (R-CH
) than to Cysteine (R-CH
-SH). This fact can be expressed either as a fraction or as equality within a group.
An other approach is to score for evolutionary relatedness. This means two processes have to be considered and expressed in a number. First, the chance of a certain codon mutating into another has to be calculated, and secondly, the fitness of this mutation has to be assessed. Both the chance and the fitness have to be expressed by a single number.
Both the chemical and the evolutionary method have been incorporated into DOTPLOT.
The chemical scoring table is called ``JIMENEZ'' after the man who described it first (1). It does not score for individual amino acids but divides them in groups.
The groups are:
PAGST
QNEDBZ
HKR
All amino acids within the group score equal (=1), between groups they score 0.
The evolutionary approach is represented by ``DAYHOFF'' again named for an important contributor of this work (2), this is a completely individual scoring table. The relatedness of every amino acid with every other amino acid is expressed as a number between 0 and 2.73.
There are three more tables available in DOTPLOT, the contents of which, as well as their names, defaults and comments can all be changed. So if you feel you have developed an improved scoring system you can change one of these tables to fit, complete with an appropriate name and defaults. If you choose to use a scoring table the next step of DOTPLOT is obvious.
INFPG6.TXT
INFPG7.TXT
BSTFT.PI3
PPPPPPPPPPPP
PPPPPP
+f<|8~<<~<
><>f<<<<
+~~~8~~~~>
~~~f~~|~
`fffff`f
pfffff`f
<~ff~f`~
`ff`f``
`ff`f``
+f~`<~~f
~~~~~f~~
+f<`<~<f
|>>>>f>>
f<|~8<<
><>f<<<<
f~~~8|>
~~~f~~|~
`fffff`f
pfffff`f
<~ff~f`~
`ff`f``
`ff`f``
~~~~~f~~
|>>>>f>>
`<<|<
p|~~~
8`fff
~~~`~
|><`>
<0PPPP
`<<|<
v<<<9
p|~~~~~>~
v~~~9
8`fff~
~fff9
~fff9
nff~9
nff`9
fff`9
~~~`~
f~f~9
|><`>
f<f>9PPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
VPPPPPPPPPP
`~>`~>>
x~>~f
`~~`~~~<~~
|~~~f<
`````~``
`````f``
p`|pnfnn
8`|8nfnn
~~~~`~~f~~
|~~`ff
~~|~`|<f<<
x~<`ff
~~#PP
>~f~
`>~~>>
>`x`>~~<
~~f~
<`~~~~~<
~`|`~~~~
``n``
``f``
f`p||pnf
n`f`n
f`8||8nf
n`f`n
f`f`f
f`f`f
f`f`f
f`n`f
~~f~
f~~``~~f
~~|~~~~|
|~f~
f~|``|<f
<~x~<~~6
8<#PPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
FPDTLT.PI3
FIRST.PIC
UUUU@
UUUUY
UUUUY
UUUU]
VDDDY
><<|<~<|
8><<~8<<
~|~~~~>~
8~~~~8|>
``fff
p`f`f
<`f`~
~~~`~
|><`>
d~|<~
f<f|<
f>f~~~~
fff~~
f>fff
f~fff
fffff
nf~ff
|~>f~
f~~>f
<0fn~
~`|ff
|`xff
UU?33
UU730k?
dD330c3
3?0c?
~<~~~
><>>|
UUUU@
UUUUY
UUUU]
UUUU_
VDDDY
VDDD@
><<|<~<|
<|<>>
~|~~~~>~
|~~~~
``fff
`ff``
p`f`f
``fpp
<`f`~
``f<<
~~~`~
~`~~~
|><`>
>`<||
f<f|<
8l<<<~
t~~>~
f>f~~~~
fff~~
f>fff
f~fff
fffff
nf~ff
|~>f~
UUUU`<
UUUU~f<
UUUU
~<~~~
><>>|
UUUU@
UUUUY
UUUUY
UUUU]
VDDD[
VDDD@
><<|<~<|
>8f<>
~|~~~~>~
~8f~~
``fff
p`f`f
<`f`~
US03333
~~~`~
|><`>
33330
t<<|l<
|8<~f|<
~8|~f~~
~<~~~
><>>|
8PHORI.PI3
f<|8~<<~<
<~~<>
`<>f<<<<f
~~~8~~~~>
~~~~~
p~~f~~|~
8fffff`f
fffff`f
~ff~f`~
`ff`f``
`ff`f``
f~`<~~f
~~~~~f~~
f<`<~<f
|>>>>f>>
UUUUU
UUUUW
UUUUU
UUUUW
UUUUU
UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
fff@~f
UUUUU
>~`~<~<
3UUUUW
~~`~~~~
UUUUU
````f
3UUUUW
````f
UUUUU
p|`|`
3UUUUW
8|`|`
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
~~~~~
3UUUUW
|~~~<
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
<|?3?
3UUUUW
>~?3?
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
3UUUUW
UUUUU
UUUUW
UUUUU
UUUUW
UUUUU
UUUUW
PNL.PI3
U>333330UU_
303308*
U3303?0
30300
U330300
?0??0?*
0>UU_
UUUUP
UUUU\a
UUUU\c
eUUUW
UUUU]a
ffmeUUUW
UUUU_
\DDD@
DDDu_
\DDDB
DDDu_
\DDDJ"DDDu_
UUUUT
UUUUW
UUUUU
UUUUW
UUUUU
UUUUW
UUUUT
UUUUW
UUUUT9
UUUUW
\DDDDRDDDu_
DDDMW
\DDDABDDDu_
DDDMW
\DDD@
DDDu_
3?>3?
3??3?
UUUU33
UUUUW
UUUU33
UUUUW
UUUU??
UUUUW
UUUUW
UUUUT
UUUUW
UUUUU
aUUUUW
UUUUU
UUUUW
UUUUU
aUUUUW
`DDDMW
DDDMW
\DDDDDA
\EDDDDA
`<<>><
`>~~~|
`>ffp`
`~ff<`
~~f~~~
~>f>|>
UUUUU@
UUUUW
UUUUUO
UUUUW
UUUUUL
UUUUW
UUUUUG
UUUUW
DDDMW
f<><>
f~~>~
fff>p
f~f~<
|~>~~
US330
REV.PI3
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
,PPPPPPPP
PPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPPPPPPPPP
PPPPIMAG
SECOND.PIC
PPPPPPP
8<0G p
PUUUV
PUUUUp
d$3!
UUUUW
UUUUU\
dQD3/
8<0B.
d22!"
dAr0(
dBr0(@
5UUUUUW
8>???
???0?
(UUIMAG
7PTAFELS.PI3
fff@~f
UUUP+
UUUP+
3????>
333??>
3?333
33333
73?33
UUUP+
UUUP+
TEM.PI3
fff@~f
neutral,weakly hydrophobic
: hydrophilic, acid amine
: hydrophilic, basic
: hydrophobic
hydrophobic, aromatic
: cross-link forming
The various scoring-tables.
The program will ask you what table to use. As you see the tables you can change are called Own1 to Own3. This name can be changed by you into any other, so if you get this program as a copy don't be alarmed if these boxes contain different names. This is also true for a lot of the numbers that are given as default values in the next couple of screens. If they are not the same as described, somebody probably has changed them to fit his/her needs. If you don't agree with these new values or if you don't agree with mine you can change them yourself (see page 10).
After you have chosen your table, or if you have selected to use no table and did not see the above screen, you will get the oppertunity to select the sequence that will be plotted horizontally.
And welcome back to those of you that chose to select DNA instead of protein. The picture you get at this stage is a little different but as it is completely analogous you probably will be able to overcome the discrepancies. So for the rest of the story replace "amino acids" by "bases" and it all will be very straightforward. If the differences get to great I will, of course, explain them to a greater extent. The screen, as can be seen above, is filled with a standard selection box and as soon as you have selected a sequence the next screen will appear and look like:
The first sequence input.
Borders of choosen sequence.
The number between the asterisks is the default value; you can get this value by pressing <RETURN>, or by clicking the left mouse button while putting the arrow in the OK-box. If you want to start at another amino acid just type in the number of the first residue of your selected range and press <RETURN>. The next line will appear at once asking for the last amino acid to be in your selection, the default option (the last residue of the total sequence) is again given between asterisks.
If you are running DNA there will even be a third line asking if you want to use the reverse sequence of the strand on your disk. Another way of doing this is by giving a higher number to start with, than to end with; the program will then automatically reverse the strand. The numbering of the reverse strand will be that of the
whole
sequence; so if you select 100 bases at the beginning of a sequence of 1000, the numbering will run from 901 to 1000 after reversing.
After the horizontal sequence is selected the whole process is repeated for the vertical sequence. The whole procedure is completely analogous to that of the horizontal sequence, so I doubt if there will be any major problems.
The next input that is asked for is the window size. The question is accompanied by its own default values and can be treated in the same way as the question about the borders of the sequences. The window is that small stretch of one sequence that will be compared to
all possible
stretches of the other sequence of exactly the same size. The number of residues that are identical in that stretch are a measure of homology. The threshold score for this window is asked next. In proteins the defaults are: window=8 and score =5; so if 5 out of 8 amino acids are identical then there is enough similarity. For scoring amino acids also see page 6
The result of a run.
As soon as this last information has been given to the program, it starts to calculate the degree of similarity and gives the output shown above. Every stretch of similarity is represented by the diagonal in the left boxed area. The right-hand part of the picture is reserved for the statistics of your run; date, time, what sequences and score table you used but also how long it took for DOTPLOT to generate the picture, in this case just over two seconds. There is also a little box that bears the name "tracks", it probably will be one in most cases, only for very long sequences it will increase. This has to do with the fact that DOTPLOT uses a lot of memory, so much in fact that if you study long sequences there is not enough room in your computer for the program to store all its tables. In this case, several tracks will be laid on your screen to build up the complete picture, and these will be counted. You don't have to worry about this process, everything will be calculated by DOTPLOT itself; as a user you will only notice the way the picture was built and nothing of the hard work.
As soon as the picture as shown above is completed, a panel is slid in from the right, you probably will have trouble getting a good look at the original picture before this happens. But don't worry; it is not lost, it can be brought back.
DOTPLOT is now in a mode were you can use the mouse to communicate directly with the program. When positioned over the panel the mouse is represented by an arrow, when over the picture by a cross. In the latter case, when over the picture, two commands are possible: Zoom-in and Show-homology.
Zoom-in.
You can zoom in any part of the picture, just place the cross on the left hand top corner of the area you want to enlarge, press the
mouse button and drag the mouse (while you keep the button pressed) downwards and to the right. While doing this you will see that the surrounded area is boxed in. As soon as you release the button the boxed area will be blown up to full size. Note that the panel is affected by this; some or all of the lettering of the rectangle marked "Borders" are now fully black.
Show-homology.
If you place the cross directly over a diagonal representing a stretch of homology and press the
RIGHT
mouse button, the program will show the two corresponding sequences around this stretch. This form of output can be printed directly or saved as an ASCII file. This will enable you to incorporate it into other texts using almost any word processor. You can return to the original screen by either using the "EXIT" button or by pressing the
RIGHT
mouse button again.
If you are using protein files and a scoring table you will notice that only perfect matches are indicated by vertical lines, and not the related amino acids. Even if not shown the program
use the scoring table you selected, so do not be alarmed about differences between the graphic and the litteral output. DNA users might notice if they occasionally run RNA against DNA sequences that T=U and this couple will be honored by a vertical line.
All other commands can be entered by use of the panel on the right of the screen. The commands are grouped, and these groups are visualized by placing them in the same box.
Show-homology.
Borders.
These options are all about the borders of that part of the sequences that is shown. The lettering in this box is either black or grey. The button is only active when black, and will not react to you pressing the left mouse button when it is grey. If you did try the Zoom-in option, and selected a small portion of the original picture, all or most will be black.
Shift.The first subgroup of "Borders" is The "shift" option. With the four arrows in this block you can shift the borders of your selection. If you zoomed in and ended up with a picture that runs horizontally from 100 to 200 and now you shift to the right the new borders will be 150 to 250. So the total length remains the same, only the borders shift. The program will
do this straight away, but only after you pressed one of the "Execute" buttons. This allows you to press combinations of shift, i.e. to the right and down. It also makes it possible to combine shift and expand.
Expand. As for "Shift", this option is only active when the lettering is in black. Expand is a kind of zoom-out; you can enlarge the area that is represented in the picture by a factor of two -horizontally and/or vertically- or you can take one or both sequences in their total. The program will
do this straight away, but only after you pressed one of the "Execute" buttons.
Change.
The change block has only two options; you can replace the horizontal or the vertical sequence by another one. The way DOTPLOT asks you for this input is exactly the same as before. It is also possible to use only a part of the sequence and, if it is a DNA file, to reverse the sequence.
Conditions.
There is only one box, but it depends on whether you have chosen DNA or protein what this option is. In case you choose protein it is "Homology" and it allows you to change or activate a score table. In fact you will be asked if you want to use one, so this is also the way to inactivate a score table. For score tables see also pages 6 and 10.
If you are studying DNA sequences the option will read: "Reverse". Using this box will enable you to reverse one or both sequences. As described on page 7, the numbering will be changed! RNA files will be changed into DNA by the reverse process; since DNA and RNA are completely compatible, as far as DOTPLOT is concerned, you don't have to worry about this.
Parameters.
The two most crucial parameters of a DOTPLOT run can be changed within this box. This can be done by clicking in the little arrow-boxes. The innermost will either increment (right) or decrement (left) the value of the window or the score by one. Clicking in the outmost will change the values by 10%+1. The program will
use the altered parameters, until you have pressed one of the "Execute" buttons.
Output.
There are two mayor ways to save the result of your run: 1.) hardcopy on paper or 2.) as a file that can be accessed by other programs.
Print. When you click in one of the two boxes with this heading DOTPLOT will make a hardcopy of the picture and the relevant information about it. It will, in fact, be the picture on page 8. The orientation of the hardcopy can be either in the portrait or the landscape format.
Save. The picture, see top of page 8, can be saved in two formats that are compatible with popular programs. Degas format is not compressed and will use 32034 bytes of memory for every picture. Doodle will take jkhjhfg
The panel.
DNA :Reverse box.
32000 bytes of memory and so, obviously, is not compressed either. The names of these files can be typed in on the prompt and saved on any disk.
Another run.
Here the most extreme commands are grouped. It will take you hours of playful use of DOTPLOT before you will want to use them.
New Run. This one is not so bad. It takes you back to your first real choice: DNA or protein.
Quit. This one is terrible: it does just what the name suggests.